A Framework for Generating Dynamic, Self-Healing Educational Plans using a Hybrid A*/RL Approach
Executive Summary
The creation of truly personalized education at scale represents a paramount challenge in modern learning technology. The one-size-fits-all model of instruction fails to account for the vast individual differences in prior knowledge, learning pace, and cognitive styles.1 Intelligent Tutoring Systems (ITS) have emerged as a powerful paradigm to address this, leveraging artificial intelligence to deliver tailored educational experiences that adapt to each learner's unique journey.2 This report presents a comprehensive technical framework for the design and implementation of an advanced ITS capable of generating and maintaining bespoke, individualized education plans.
The proposed system is architected around a hybrid algorithmic core that synergistically combines the strengths of structured pathfinding with dynamic, real-time adaptation. The core challenge is twofold: first, to generate a coherent, long-term learning plan that is optimally sequenced and efficient; second, to continuously adjust this plan in response to the learner's moment-to-moment performance, creating a "self-healing" path that remediates struggles and accelerates mastery.
To solve this, our framework is built upon three foundational pillars derived from established principles in computer science and education theory:
- A Semantically Rich Domain Model: A knowledge graph serves as the system's understanding of the subject matter, representing concepts and the vast array of learning resources (flashcards, videos, quizzes, exercises) as nodes, with prerequisite and semantic relationships as weighted edges.4 This provides the structured "map" upon which learning paths are charted.
- A Probabilistic Student Model: Moving beyond simplistic score-tracking, the system employs Bayesian Knowledge Tracing (BKT), a hidden Markov model, to maintain a probabilistic estimate of the learner's mastery of each concept.6 This model of the learner's latent knowledge state is crucial for making nuanced pedagogical decisions under the inherent uncertainty of the learning process.
- A Hybrid Tutoring Model: The engine for plan generation and adaptation consists of two layers. A global A* Search algorithm first generates an optimal, long-term learning path through the knowledge graph, minimizing a multi-criteria cost function that accounts for resource difficulty, time, and pedagogical quality.7 Then, a local Reinforcement Learning (RL) agent, modeled as a Deep Q-Network (DQN), acts as a policy wrapper around this plan. At each step, the RL agent observes the student's probabilistic knowledge state and decides the most effective pedagogical action—whether to proceed with the plan, substitute a resource, remediate a prerequisite, or skip ahead—thereby maximizing long-term learning gain.8
This report provides a complete blueprint for this system, including the high-level architecture, the detailed mathematical formulations of BKT and A*, a full production-ready implementation of the core pathfinding algorithm in TypeScript, a data-driven framework for weighting educational resources, and evidence-based recommendations for designing synergistic educational content. The result is a robust, theoretically grounded, and practically implementable framework for creating learning experiences that are not merely personalized, but perpetually adaptive and self-correcting.
Section 1: An Architectural Framework for Intelligent Tutoring
The development of a sophisticated adaptive learning system necessitates a structured and modular architecture. A monolithic approach, where a single algorithm attempts to manage all aspects of the pedagogical interaction, is brittle and difficult to scale. Instead, a more robust design is achieved by adopting the well-established architectural patterns of Intelligent Tutoring Systems (ITS). This approach decomposes the complex problem of teaching into distinct, interacting components, each with a clearly defined responsibility. This separation of concerns is not merely an academic exercise; it is a practical blueprint for building maintainable, extensible, and pedagogically sound educational software.3
1.1. The Core Components of an Adaptive System: The ITS Triad
The most widely accepted framework for ITS architecture comprises four core components: the Domain Model, the Student Model, the Tutoring Model, and the User Interface Model.3 While the User Interface is critical for delivering the learning experience, its design is beyond the scope of this algorithmic report. Our focus lies on the three computational models that constitute the system's "intelligence."
- The Domain Model (What to Teach): This component encapsulates all the knowledge and skills related to the subject matter. It acts as the system's expert, providing the ground truth against which student performance is measured.11
- The Student Model (Who is Being Taught): This component tracks and assesses the learner's evolving state of knowledge, skills, misconceptions, and even affective states like motivation or confusion.12 It is a dynamic representation of the individual learner.
- The Tutoring Model (How to Teach): This is the pedagogical engine of the system. It uses information from the Domain and Student models to make instructional decisions, such as selecting the next problem, providing feedback, or adjusting the learning path.3
This tripartite architecture provides a logical mapping for the system's required inputs and functions. The user-provided knowledge graph directly implements the Domain Model. The historical and real-time performance data feeds the Student Model. The core algorithm for plan generation and adaptation constitutes the Tutoring Model. This modularity ensures that improvements to one component—such as refining the knowledge graph or enhancing the student modeling technique—can be made without requiring a complete overhaul of the entire system.
1.2. The Domain Model: A Semantically Rich Knowledge Graph
The foundation of any adaptive learning system is its representation of the knowledge domain. A simple list of topics is insufficient as it fails to capture the intricate web of relationships that define expertise. A knowledge graph provides a far more powerful and flexible representation, structuring scattered knowledge into a coherent whole.1
Function and Structure
The primary function of the knowledge graph is to serve as the Domain Model, providing a machine-readable representation of the curriculum. It goes beyond a simple table of contents by explicitly modeling the inherent semantic connections between concepts and the educational resources that teach them.4 This semantic integration is vital for navigating the vast and often heterogeneous landscape of learning materials available on a modern platform.5
The graph is a directed, weighted graph, G=(V,E), where:
- Vertices (V): The nodes in the graph represent the fundamental units of the curriculum. These are heterogeneous and can be categorized into two main types:
- Concept Nodes: Abstract representations of a skill or knowledge component (e.g., "Linear Regression," "Verb Conjugation").
- Learning Resource (LR) Nodes: Concrete educational materials that students interact with (e.g., a specific video lesson, a flashcard deck, a quiz, an interactive exercise). Each LR node is linked to the concept node(s) it teaches or assesses.
- Edges (E): The directed edges represent relationships between nodes. The type of relationship is critical for pedagogical reasoning. Key edge types include:
- is_prerequisite_for(Concept_A, Concept_B)
- explains(Video_1, Concept_A)
- tests(Quiz_1, Concept_A)
- is_part_of(Concept_A, Topic_X)
- Weights: Each edge connecting to a Learning Resource node has a weight, w(LR), representing the "cost" of completing that resource. This cost is not a single value but a composite measure derived from multiple criteria, including difficulty, time, and pedagogical quality, as will be detailed in Section 4.2.
Implementation in TypeScript
For practical implementation, an adjacency list representation is superior to an adjacency matrix for the typically sparse nature of knowledge graphs.13 A Map provides efficient O(1) average time complexity for node lookups. The following TypeScript interfaces and classes provide a robust foundation for the knowledge graph.13
TypeScript
// --- Interfaces for Data Payloads ---
// Represents any item in the graph: a concept or a resource
export interface IGraphNodeData {
id: string; // Unique identifier
type: 'Concept' | 'LearningResource';
title: string;
// Other metadata...
}
// Specific data for a Learning Resource
export interface ILearningResourceData extends IGraphNodeData {
type: 'LearningResource';
resourceType: 'Flashcard' | 'Video' | 'Quiz' | 'Exercise';
communityComplexity: number; // e.g., 1-5 scale
// More metadata like URL, duration, etc.
}
// Represents the relationship between two nodes
export interface IEdge {
destinationId: string;
relationship: 'prerequisite' | 'explains' | 'tests' | 'related';
weight: number; // The cost, primarily for edges leading to LearningResources
}
// --- Core Graph Data Structures ---
// Represents a node in the graph
export class GraphNode {
public data: IGraphNodeData | ILearningResourceData;
public edges: IEdge[];
constructor(data: IGraphNodeData | ILearningResourceData) {
this.data = data;
this.edges = [];
}
public addEdge(destinationId: string, relationship: IEdge['relationship'], weight: number = 1): void {
const edge: IEdge = { destinationId, relationship, weight };
this.edges.push(edge);
}
}
// The main graph class using an adjacency list representation
export class KnowledgeGraph {
private nodes: Map<string, GraphNode>;
constructor() {
this.nodes = new Map<string, GraphNode>();
}
public addNode(data: IGraphNodeData | ILearningResourceData): GraphNode {
if (this.nodes.has(data.id)) {
return this.nodes.get(data.id)!;
}
const newNode = new GraphNode(data);
this.nodes.set(data.id, newNode);
return newNode;
}
public getNode(id: string): GraphNode | undefined {
return this.nodes.get(id);
}
public addEdge(sourceId: string, destinationId: string, relationship: IEdge['relationship'], weight: number = 1): void {
const sourceNode = this.getNode(sourceId);
const destinationNode = this.getNode(destinationId);
if (sourceNode && destinationNode) {
sourceNode.addEdge(destinationId, relationship, weight);
} else {
throw new Error('Source or destination node not found in the graph.');
}
}
}
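As a brief illustration of how these structures fit together, the following snippet builds a miniature graph with two concepts and one video resource. The node identifiers, titles, and weight value are hypothetical examples, not part of any real curriculum.
TypeScript
import { KnowledgeGraph } from './KnowledgeGraph';

const graph = new KnowledgeGraph();

// Two concept nodes and one learning resource (hypothetical identifiers)
graph.addNode({ id: 'concept:variables', type: 'Concept', title: 'Variables' });
graph.addNode({ id: 'concept:loops', type: 'Concept', title: 'Loops' });
graph.addNode({
  id: 'video:intro-loops',
  type: 'LearningResource',
  resourceType: 'Video',
  title: 'Introduction to Loops',
  communityComplexity: 2,
});

// "Variables" is a prerequisite for "Loops"; the video explains "Loops" at a cost of 2.5.
graph.addEdge('concept:variables', 'concept:loops', 'prerequisite');
graph.addEdge('video:intro-loops', 'concept:loops', 'explains', 2.5);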
1.3. The Student Model: Probabilistic Knowledge Tracing
To adapt effectively, the system needs to know more than just whether the last answer was correct. It must infer the learner's underlying, unobservable knowledge state. This is the central task of the Student Model. The process of modeling this latent knowledge over time is known as Knowledge Tracing (KT).14 KT is a core feature of most modern adaptive learning systems, as it provides the necessary input for making pedagogical decisions.15
Chosen Model: Bayesian Knowledge Tracing (BKT)
While various KT models exist, including complex deep learning approaches, we propose using Bayesian Knowledge Tracing (BKT). BKT is a specific type of Hidden Markov Model (HMM) that has been successfully used in numerous ITSs.6 It is chosen for its combination of predictive power, computational efficiency, and, crucially, its interpretable, psychologically meaningful parameters.17
BKT models the student's knowledge of a single skill or concept as a binary latent variable: the student has either mastered the skill or they have not. The system observes the student's actions (e.g., correct or incorrect answers) and uses these observations to update the probability that the student has mastered the skill.
For each concept k in the knowledge graph, BKT relies on four key parameters 6:
- p(L0)k: The probability of prior knowledge. This is the initial probability that the student knows skill k before any interaction.
- p(T)k: The probability of transit. This is the probability that the student will learn the skill (transition from the "unmastered" to the "mastered" state) after a single opportunity to apply it.
- p(G)k: The probability of guess. This is the probability that a student who has not mastered the skill will nevertheless answer a question correctly.
- p(S)k: The probability of slip. This is the probability that a student who has mastered the skill will make a mistake and answer a question incorrectly.
Mathematical Formulation of BKT
The core of BKT is a set of equations that update the probability of mastery, p(Ln), after the n-th interaction. Let p(Ln) be the probability that the student knows the skill just before the n-th opportunity.
- Initial State: The probability of knowing the skill before the first opportunity is simply the prior knowledge parameter:
p(L1) = p(L0)
- Posterior Probability Update (after observation): After the student provides an answer (observation obs at time n), we update our belief using Bayes' theorem. This results in the posterior probability of the student having known the skill given their answer.
If the answer is correct (obs = correct), the probability that the student knew the skill is:
p(Ln∣obsn=correct) = [p(Ln)⋅(1−p(S))] / [p(Ln)⋅(1−p(S)) + (1−p(Ln))⋅p(G)]
The numerator represents the probability of knowing the skill and not slipping. The denominator is the total probability of getting a correct answer (either by knowing and not slipping, or by not knowing and guessing).
If the answer is incorrect (obs = wrong), the probability that the student knew the skill is:
p(Ln∣obsn=wrong) = [p(Ln)⋅p(S)] / [p(Ln)⋅p(S) + (1−p(Ln))⋅(1−p(G))]
The numerator is the probability of knowing the skill but slipping. The denominator is the total probability of an incorrect answer.
- Applying the Learning Transit: After updating our belief based on the observation, we account for the possibility of learning. The student might have learned the skill during the interaction itself. The probability of knowing the skill for the next opportunity, p(Ln+1), is the probability that they knew it before (given the observation) OR that they did not know it but learned it:
p(Ln+1)=p(Ln∣obsn)+(1−p(Ln∣obsn))⋅p(T)
- Predicting Future Performance: The model can also be used to predict the probability of a correct answer on the next question:
p(Correctn+1)=p(Ln+1)⋅(1−p(S))+(1−p(Ln+1))⋅p(G)
This probabilistic approach is fundamentally more powerful than simply tracking scores. It explicitly models the uncertainties inherent in assessment, providing a much richer and more accurate signal to the Tutoring Model about the learner's true competence.
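To make these update equations concrete, the following is a minimal TypeScript sketch of a single BKT update for one concept. The interface and function names (BKTParams, bktUpdate, predictCorrect) are illustrative assumptions rather than a prescribed API.
TypeScript
// Per-concept BKT parameters
export interface BKTParams {
  pL0: number; // p(L0): prior probability of mastery (seeds the first update)
  pT: number;  // p(T): probability of learning after one opportunity
  pG: number;  // p(G): probability of guessing correctly without mastery
  pS: number;  // p(S): probability of slipping despite mastery
}

// Returns p(L_{n+1}) given the current mastery estimate p(L_n) and the observed answer.
export function bktUpdate(pL: number, correct: boolean, params: BKTParams): number {
  const { pT, pG, pS } = params;

  // Posterior probability that the skill was already known, given the observation
  const posterior = correct
    ? (pL * (1 - pS)) / (pL * (1 - pS) + (1 - pL) * pG)
    : (pL * pS) / (pL * pS + (1 - pL) * (1 - pG));

  // Apply the learning transit: the learner may have acquired the skill during this step
  return posterior + (1 - posterior) * pT;
}

// Predicted probability of a correct answer on the next opportunity
export function predictCorrect(pLNext: number, params: BKTParams): number {
  return pLNext * (1 - params.pS) + (1 - pLNext) * params.pG;
}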
1.4. The Tutoring Model: A Hybrid Planning and Adaptation Engine
The Tutoring Model is the system's brain, responsible for making all pedagogical decisions. The central challenge it faces is balancing two competing needs: the need for a long-term, coherent plan and the need for immediate, context-sensitive adaptation.2 A static curriculum provides structure but cannot respond to individual needs. A purely reactive system can adapt moment-to-moment but may lack a clear, efficient path toward a long-term goal.19
To resolve this tension, we propose a two-layered, hybrid Tutoring Model:
- Global Path Planner (A* Search): This layer is responsible for the initial generation of the learning plan. It treats the problem as a pathfinding task on the knowledge graph. Using the A* search algorithm, it computes a globally optimal sequence of learning resources to take the learner from their current knowledge state to their desired learning objective. This provides the foundational structure and ensures the plan is efficient and comprehensive. This component will be detailed in Section 2.
- Local Adaptation Agent (Reinforcement Learning): This layer acts as an intelligent supervisor, executing the plan generated by the A* planner. At each step in the plan, this agent observes the learner's current state (as provided by the BKT Student Model) and decides on the best immediate pedagogical action. This action might be to follow the plan, but it could also be to substitute an easier resource, insert a remedial activity, or even skip a topic the student has already mastered. This agent uses Reinforcement Learning (RL) to learn a policy that maximizes long-term student learning. This provides the dynamic, "self-healing" capability. This component will be detailed in Section 3.
This hybrid architecture mirrors the behavior of an expert human tutor, who typically starts with a well-structured lesson plan (the A* path) but remains flexible, ready to deviate and improvise based on the student's real-time feedback and comprehension (the RL agent's policy).10 The synergy between these two layers allows the system to be both strategic and tactical, providing a learning experience that is both structured and deeply personalized.
The intelligence of such a system is not located in any single algorithm but is an emergent property of the interaction between these specialized models. The Domain Model provides the map, the Student Model tracks the learner's location and heading on that map, and the hybrid Tutoring Model acts as the navigator, charting the initial course and making real-time adjustments for traffic and terrain. The entire "self-healing" and adaptive capability of the system hinges on the quality and interplay of these components. A probabilistic student model, in particular, is non-negotiable for true adaptation. Decisions made by the Tutoring Model would be suboptimal if based on noisy, deterministic signals like raw scores instead of the richer, probabilistic state provided by BKT. Therefore, implementing a robust BKT model is a prerequisite for any effective adaptation.
Section 2: The Learning Path Generation Algorithm: A* Search
The initial creation of a personalized learning plan is a foundational requirement of the system. This task can be formally modeled as an optimal pathfinding problem within the structured environment of the knowledge graph. The goal is to construct a sequence of learning resources that efficiently and effectively guides a learner from their current state of knowledge to their desired learning objective. For this task, the A* search algorithm is exceptionally well-suited due to its efficiency and guarantee of optimality.7
2.1. Problem Formulation: Optimal Pathfinding in a Weighted Knowledge Graph
We define the generation of a learning plan as the problem of finding the "shortest" path between a start node and a goal node in the Domain Model's knowledge graph.
- Start Node: This is not a single node but a conceptual representation of the learner's current knowledge frontier, determined by the set of concepts for which their mastery probability p(L) (from the BKT model) is below a target threshold.
- Goal Node: This is the target concept or set of concepts specified in the user's learning objective.
- Path: A path is a sequence of Learning Resource (LR) nodes that connects the start state to the goal state, respecting the prerequisite relationships defined by the graph's edges.
- "Shortest" Path: In this context, "shortest" does not mean the fewest number of nodes. It refers to the path with the minimum cumulative "cost," where the cost of traversing an edge to an LR node reflects the total effort (e.g., time, cognitive load, difficulty) required to complete that resource.20 The precise calculation of this cost is a multi-criteria function detailed in Section 4.2.
While other algorithms like Dijkstra's can find the shortest path, A* is an informed search algorithm. It uses a heuristic function to intelligently guide its search toward the goal, allowing it to explore far fewer nodes than an uninformed search, making it significantly more efficient on large, complex graphs like a comprehensive knowledge graph.21
2.2. The A* Algorithm: Mathematical Formulation for Education
The A* algorithm works by exploring the graph and, at each node n, evaluating it based on the function 22:
f(n)=g(n)+h(n)
The algorithm maintains a priority queue of nodes to visit (the "open list"), always choosing to expand the node with the lowest f(n) score.
- g(n): The Cost Function (Past Cost)
g(n) represents the exact, known cost of the path from the starting node to the current node n. In our educational context, this is the cumulative learning effort expended so far. It is calculated by summing the weights, w(LRi), of all learning resources traversed in the path from the start to n:
g(n) = Σ_{i ∈ path to n} w(LRi)
The weight w(LRi) is a carefully constructed value reflecting the resource's difficulty, required time, and pedagogical quality, as will be defined in Section 4.2.
- h(n): The Heuristic Function (Estimated Future Cost)
h(n) is the heuristic, an "educated guess" of the cost of the cheapest path from node n to the goal. The quality of the heuristic is critical to A*'s performance. For A* to be optimal (i.e., guaranteed to find the true shortest path), the heuristic must be admissible, meaning it never overestimates the actual cost to reach the goal.7
A suitable admissible heuristic for our knowledge graph can be derived from a "relaxed" version of the problem. We can estimate the remaining cost by considering the number of prerequisite concepts between the current node n and the goal, multiplied by an average resource cost. For example, one could use Manhattan distance if the graph can be mapped to a grid-like structure, or more simply:
h(n) = Distance_concepts(n, goal) × w̄_resource
where Distance_concepts(n, goal) is the number of concept nodes on the shortest unweighted path from n to the goal, and w̄_resource is the average weight of all learning resources in the system. This heuristic is admissible because it assumes an ideal scenario where every concept can be learned with an average-cost resource, which is unlikely to be more expensive than the actual cheapest path.
The design of this heuristic function is a crucial tuning parameter. It effectively encodes the system's pedagogical "optimism." A higher h(n) value (more optimistic) makes the algorithm more "greedy," favoring paths that appear to head directly to the goal. A lower h(n) value makes the algorithm more cautious and exploratory, behaving more like Dijkstra's algorithm.22 This allows for the implementation of different strategic planning styles, such as a "remedial" path that explores more foundational options versus an "accelerated" path that prioritizes directness.
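As one possible realization of this heuristic, the sketch below estimates the unweighted hop distance to the goal with a breadth-first search over the graph's edges and scales it by an average resource weight. The makeHeuristic factory and the avgResourceWeight parameter are illustrative assumptions; a production system would likely precompute these distances rather than run a BFS per expanded node.
TypeScript
import { KnowledgeGraph } from './KnowledgeGraph';

// Unweighted BFS distance (in hops) from `fromId` to `goalId`.
function conceptDistance(graph: KnowledgeGraph, fromId: string, goalId: string): number {
  const queue: Array<{ id: string; depth: number }> = [{ id: fromId, depth: 0 }];
  const visited = new Set<string>([fromId]);
  while (queue.length > 0) {
    const { id, depth } = queue.shift()!;
    if (id === goalId) return depth;
    for (const edge of graph.getNode(id)?.edges ?? []) {
      if (!visited.has(edge.destinationId)) {
        visited.add(edge.destinationId);
        queue.push({ id: edge.destinationId, depth: depth + 1 });
      }
    }
  }
  return 0; // Unreachable from here: 0 keeps the estimate from overshooting.
}

// h(n) = conceptDistance(n, goal) × average resource weight
export function makeHeuristic(graph: KnowledgeGraph, avgResourceWeight: number) {
  return (nodeId: string, goalId: string): number =>
    conceptDistance(graph, nodeId, goalId) * avgResourceWeight;
}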
Algorithm Pseudocode
The A* algorithm proceeds as follows 21:
- Initialize two sets: openList (a priority queue) and closedList (a hash set for fast lookups).
- Create a starting node and add it to openList.
- While openList is not empty:
a. Let currentNode be the node in openList with the lowest f score.
b. Remove currentNode from openList and add it to closedList.
c. If currentNode is the goal node, reconstruct the path by following parent pointers back to the start and return the path.
d. For each neighbor of currentNode:
i. If neighbor is in closedList, skip it.
ii. Calculate the tentative g score for the path to neighbor through currentNode: tentative_gScore = g(currentNode) + weight(currentNode, neighbor).
iii. If neighbor is not in openList OR tentative_gScore < g(neighbor):
- This is a better path. Record it.
- Set parent(neighbor) = currentNode.
- Set g(neighbor) = tentative_gScore.
- Set f(neighbor) = g(neighbor) + h(neighbor).
- If neighbor is not in openList, add it.
- If openList is empty and the goal was not reached, return failure (no path exists).
2.3. TypeScript Implementation of the A* Path Planner
The following is a full, production-oriented TypeScript implementation of the A* pathfinding algorithm, tailored for use with the KnowledgeGraph data structures defined in Section 1.2. TypeScript's static typing is invaluable here, ensuring that the complex data objects for nodes, paths, and costs are handled correctly, reducing runtime errors and improving maintainability.24 The implementation requires a PriorityQueue data structure, for which a min-heap is the most efficient choice.7
Priority Queue (Min-Heap) Implementation
A min-heap is essential for efficiently managing the openList, always allowing for quick retrieval of the node with the minimum f score.
TypeScript
// A generic Priority Queue implementation using a Min-Heap
// It stores items and prioritizes them based on a provided score.
interface IQueueItem<T> {
item: T;
priority: number;
}
export class PriorityQueue<T> {
private heap: IQueueItem<T>[] = [];
public enqueue(item: T, priority: number): void {
this.heap.push({ item, priority });
this.bubbleUp(this.heap.length - 1);
}
public dequeue(): T | undefined {
if (this.isEmpty()) {
return undefined;
}
this.swap(0, this.heap.length - 1);
const { item } = this.heap.pop()!;
if (!this.isEmpty()) {
this.sinkDown(0);
}
return item;
}
public isEmpty(): boolean {
return this.heap.length === 0;
}
private bubbleUp(index: number): void {
while (index > 0) {
const parentIndex = Math.floor((index - 1) / 2);
if (this.heap[parentIndex].priority <= this.heap[index].priority) {
break;
}
this.swap(index, parentIndex);
index = parentIndex;
}
}
private sinkDown(index: number): void {
const leftChild = 2 * index + 1;
const rightChild = 2 * index + 2;
let smallest = index;
if (leftChild < this.heap.length && this.heap[leftChild].priority < this.heap[smallest].priority) {
smallest = leftChild;
}
if (rightChild < this.heap.length && this.heap[rightChild].priority < this.heap[smallest].priority) {
smallest = rightChild;
}
if (smallest !== index) {
this.swap(index, smallest);
this.sinkDown(smallest);
}
}
private swap(i: number, j: number): void {
[this.heap[i], this.heap[j]] = [this.heap[j], this.heap[i]];
}
}
A* Search Function Implementation
This function integrates the KnowledgeGraph and PriorityQueue to execute the pathfinding logic. It is structured to be generic and reusable, drawing from best practices seen in established open-source libraries.27
TypeScript
// Import the graph structures from Section 1.2
import { KnowledgeGraph, GraphNode } from './KnowledgeGraph';
import { PriorityQueue } from './PriorityQueue';
// A-Star specific node data structure to store pathfinding metadata
interface AStarNode {
node: GraphNode;
gScore: number; // Cost from start to this node
fScore: number; // gScore + hScore
parent: AStarNode | null;
}
// The main A* search function
export function aStarSearch(
graph: KnowledgeGraph,
startId: string,
goalId: string
): GraphNode[] | null {
const startNode = graph.getNode(startId);
const goalNode = graph.getNode(goalId);
if (!startNode || !goalNode) {
console.error("Start or Goal node not found in the graph.");
return null;
}
// Heuristic function (h-score): Manhattan distance for simplicity
// In a real system, this would be more sophisticated (see Section 2.2)
const heuristic = (nodeA: GraphNode, nodeB: GraphNode): number => {
// This is a placeholder. A real implementation needs a way to calculate
// conceptual distance, perhaps from pre-calculated graph embeddings or layout.
// For now, we return a simple placeholder value (0 when no coordinates exist).
const dx = Math.abs((nodeA.data as any).x - (nodeB.data as any).x) || 0;
const dy = Math.abs((nodeA.data as any).y - (nodeB.data as any).y) || 0;
return dx + dy;
};
const openList = new PriorityQueue<AStarNode>();
const closedList = new Set<string>();
// Store metadata for each node ID
const aStarNodeMap = new Map<string, AStarNode>();
const startAStarNode: AStarNode = {
node: startNode,
gScore: 0,
fScore: heuristic(startNode, goalNode),
parent: null,
};
aStarNodeMap.set(startId, startAStarNode);
openList.enqueue(startAStarNode, startAStarNode.fScore);
while (!openList.isEmpty()) {
const currentAStarNode = openList.dequeue()!;
const currentNode = currentAStarNode.node;
// Skip stale duplicate entries for nodes that have already been finalized.
if (closedList.has(currentNode.data.id)) {
continue;
}
if (currentNode.data.id === goalId) {
// Goal reached, reconstruct path
const path: GraphNode[] = [];
let temp: AStarNode | null = currentAStarNode;
while (temp !== null) {
path.unshift(temp.node);
temp = temp.parent;
}
return path;
}
closedList.add(currentNode.data.id);
for (const edge of currentNode.edges) {
const neighborId = edge.destinationId;
if (closedList.has(neighborId)) {
continue;
}
const neighborNode = graph.getNode(neighborId);
if (!neighborNode) {
continue;
}
const tentativeGScore = currentAStarNode.gScore + edge.weight;
let neighborAStarNode = aStarNodeMap.get(neighborId);
if (!neighborAStarNode || tentativeGScore < neighborAStarNode.gScore) {
// This is a better path to the neighbor
if (!neighborAStarNode) {
neighborAStarNode = {
node: neighborNode,
gScore: tentativeGScore,
fScore: tentativeGScore + heuristic(neighborNode, goalNode),
parent: currentAStarNode,
};
aStarNodeMap.set(neighborId, neighborAStarNode);
openList.enqueue(neighborAStarNode, neighborAStarNode.fScore);
} else {
neighborAStarNode.parent = currentAStarNode;
neighborAStarNode.gScore = tentativeGScore;
neighborAStarNode.fScore = tentativeGScore + heuristic(neighborNode, goalNode);
// Note: A standard priority queue doesn't support decrease-key.
// A common workaround is to enqueue the node again with the new, lower priority.
// The algorithm will process the lower-priority one first.
// Our simple check `if (closedList.has(neighborId))` handles duplicates.
openList.enqueue(neighborAStarNode, neighborAStarNode.fScore);
}
}
}
}
// No path found
return null;
}
This implementation provides a solid, efficient, and type-safe foundation for the Global Path Planner. It correctly uses a priority queue for the open list and manages node metadata to reconstruct the optimal path upon completion.
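As a usage example, and assuming a KnowledgeGraph populated as in Section 1.2, the planner can be invoked as follows. The node identifiers and the './AStarSearch' module path are hypothetical.
TypeScript
import { KnowledgeGraph } from './KnowledgeGraph';
import { aStarSearch } from './AStarSearch';

// A KnowledgeGraph populated with concept and resource nodes (see Section 1.2).
declare const graph: KnowledgeGraph;

// Hypothetical start and goal concept identifiers.
const path = aStarSearch(graph, 'concept:variables', 'concept:recursion');

if (path) {
  // The returned path is an ordered list of nodes from the start to the goal.
  console.log(path.map((node) => node.data.title).join(' -> '));
} else {
  console.log('No viable learning path exists between the selected nodes.');
}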
Section 3: The Self-Healing Plan: Dynamic Adaptation via Reinforcement Learning
While the A* algorithm provides an excellent, globally optimal static plan, learning itself is a messy, non-linear, and dynamic process. A pre-computed path, no matter how well-designed, cannot anticipate a learner's specific struggles, breakthroughs, or moments of disengagement. To create a truly adaptive and "self-healing" system, a second layer of intelligence is required—one that can make localized, real-time adjustments to the plan based on the learner's immediate state. Reinforcement Learning (RL) provides a powerful and theoretically sound framework for this type of sequential decision-making under uncertainty.29
3.1. Conceptual Framework: Reinforcement Learning for Instructional Sequencing
The core idea is to treat the tutoring process as a series of decisions. At each point in the learning journey, the system (the "agent") observes the student's state and must choose a pedagogical action to take. The goal is to learn a "policy"—a strategy for choosing actions—that maximizes a cumulative "reward" over the long term, with the reward being tied to student learning.31
Instead of using RL to generate the entire learning path from scratch, which can be highly sample-inefficient and struggle with the "cold start" problem 19, we employ a more practical hybrid approach. The A* planner provides the long-term strategic direction, and the RL agent provides the short-term tactical adjustments. At each step t of the A*-generated plan, the RL agent observes the learner's state and decides on the best action. This action might be to follow the plan, but it could also be a deviation designed to better meet the learner's immediate needs. This makes the system robust from day one while allowing it to become progressively more personalized as it gathers data.
3.2. Defining the RL Components (Markov Decision Process - MDP)
To apply RL, we must first formally define the learning environment as a Markov Decision Process (MDP). An MDP is a mathematical framework for modeling decision-making and is defined by a tuple (S,A,P,R,γ).8
- State (S): The state, st, is a snapshot of all relevant information about the environment at time t. A rich and informative state representation is crucial for the agent to make good decisions.9 Our state vector will be a concatenation of several features:
- Learner Knowledge State: The vector of BKT probabilities [p(L1),p(L2),...,p(Lk)] for the concepts most relevant to the current point in the plan. This is the most critical component of the state, as it captures the learner's inferred mastery.
- Recent Performance History: A short-term memory of recent interactions, such as a binary vector representing success/failure on the last 3-5 activities.
- Current Plan Context: Information about the currently planned Learning Resource, LRt, from the A* path, including its type (Video, Quiz), complexity, and the concept it addresses.
- Interaction Features: Metrics from the last interaction, such as time_on_task, attempts_made, and hints_used.
- Action (A): The action, at, is the pedagogical move the agent chooses to make from state st. The action space should include a variety of interventions that a human tutor might employ:
- Proceed: Present the next Learning Resource as specified in the A* plan. This is the default action.
- Substitute(LR_type, difficulty): Replace the planned resource with another one targeting the same concept but with different characteristics (e.g., substitute a difficult exercise with an easier explanatory video).
- Remediate(prerequisite_concept): Pause the current plan and insert a new activity to reinforce a prerequisite concept where the learner's BKT mastery p(L) is low.
- Skip: If the learner's BKT mastery p(L) for the target concept is already high (e.g., > 0.95), skip the planned resource and move to the next step in the A* plan.
- ProvideHint: Offer a hint for the current activity.
- Reward (R): The reward function, R(s,a,s′), provides the feedback signal that guides the agent's learning. The design of the reward function is the most critical element, as it codifies the system's pedagogical philosophy. A naive reward, such as +1 for a correct answer, can lead to perverse strategies, like the agent only presenting trivial questions.33 A robust reward function must balance multiple objectives to encourage deep, efficient learning. We propose a composite reward function:
Rt = w1⋅Δp(L) + w2⋅Engagement − w3⋅Effort − w4⋅Struggle
Where:
- Δp(L): The change in the BKT probability of mastery for the target concept after the action. This is the primary component, directly rewarding demonstrable learning gain.
- Engagement: A small positive reward for completing an activity, or a negative reward for skipping or abandoning it.
- Effort: A small penalty proportional to the time taken to complete the activity, encouraging efficiency.
- Struggle: A penalty for requiring multiple hints or failed attempts on a single resource.
The weights (w1,w2,w3,w4) are crucial hyperparameters that allow educators to tune the system's pedagogical style. A high w1 creates a "mastery-focused" tutor, while a high w2 creates a more "motivational" tutor.
- Transition Probability (P) and Discount Factor (γ):
- The transition probability P(s′∣s,a) is the probability of moving to state s′ after taking action a in state s. In our system, this function is defined by the environment itself (i.e., the student's interaction) and does not need to be explicitly modeled by the agent.
- The discount factor, γ ∈ [0, 1], determines the importance of future rewards. A value close to 1 makes the agent farsighted, prioritizing actions that lead to the best long-term learning outcomes, even at the cost of immediate reward. A value close to 0 makes it myopic, focusing only on the next reward.9 For educational applications, a high γ (e.g., 0.9 to 0.99) is appropriate. A minimal TypeScript sketch of these MDP components appears after this list.
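The sketch below shows one way to represent these MDP components in TypeScript; every field name and coefficient value is an illustrative assumption rather than a prescribed schema.
TypeScript
// Snapshot of the learner and plan context at time t (illustrative fields)
export interface TutorState {
  masteryProbs: number[];       // BKT p(L) values for the relevant concepts
  recentOutcomes: boolean[];    // success/failure on the last few activities
  plannedResourceType: 'Flashcard' | 'Video' | 'Quiz' | 'Exercise';
  plannedComplexity: number;    // e.g., 1-5 scale
  timeOnTask: number;           // seconds spent on the last interaction
  attemptsMade: number;
  hintsUsed: number;
}

// The discrete pedagogical actions available to the agent
export type TutorAction =
  | { kind: 'Proceed' }
  | { kind: 'Substitute'; resourceType: string; difficulty: number }
  | { kind: 'Remediate'; prerequisiteConceptId: string }
  | { kind: 'Skip' }
  | { kind: 'ProvideHint' };

// Composite reward: Rt = w1·Δp(L) + w2·Engagement − w3·Effort − w4·Struggle
export function computeReward(
  deltaMastery: number,   // change in p(L) for the target concept
  engagement: number,     // e.g., +1 for completion, -1 for abandonment
  effortSeconds: number,  // time spent on the activity
  struggleEvents: number, // failed attempts plus hints used
  w = { w1: 10, w2: 1, w3: 0.01, w4: 0.5 } // tunable pedagogical weights (assumed values)
): number {
  return w.w1 * deltaMastery + w.w2 * engagement - w.w3 * effortSeconds - w.w4 * struggleEvents;
}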
3.3. Implementation Strategy: Deep Q-Network (DQN) for the Tutoring Agent
The state space defined above is large and continuous, making it impossible to use traditional tabular RL methods like Q-learning. Therefore, we will use a function approximator to estimate the optimal action-value function, Q∗(s,a), which represents the maximum expected cumulative reward achievable from state s by taking action a. A Deep Q-Network (DQN), a type of deep neural network, is a powerful and widely used choice for this task.9
- Architecture: The DQN will be a multi-layer perceptron (MLP) that takes the state vector st as input and outputs a vector of Q-values, one for each possible action in our discrete action space A.
- Policy: The agent's policy will be ϵ-greedy. Most of the time, it will choose the action with the highest Q-value: at = argmax_a Q(st, a; θ). With a small probability ϵ, it will choose a random action to encourage exploration.
- Training: The network will be trained "offline" using a large dataset of historical student interactions from the platform. This is a practical necessity in education, where online, real-time exploration can be slow and potentially detrimental to the student experience.33 The training process uses techniques like experience replay to break correlations in the data. The network's weights, θ, are updated by minimizing a loss function based on the Bellman equation, which aims to reduce the temporal-difference error between the predicted Q-value and a target value 9:
Loss(θ) = E[(yt − Q(st, at; θ))²]
where the target yt is calculated as:
yt = Rt + γ · max_{a′} Q(st+1, a′; θ−)
Here, θ− represents the weights of a separate, slow-updating target network, a technique that improves training stability.
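For illustration, the ϵ-greedy action selection and the temporal-difference target above can be expressed as follows. The QFunction type stands in for whatever function approximator is used (the DQN and its slow-updating target copy); all names are assumptions of this sketch.
TypeScript
// A Q-function maps a state vector to one Q-value per discrete action.
type QFunction = (state: number[]) => number[];

// ϵ-greedy policy: explore with probability epsilon, otherwise exploit the best Q-value.
export function selectAction(qNetwork: QFunction, state: number[], epsilon: number): number {
  const qValues = qNetwork(state);
  if (Math.random() < epsilon) {
    return Math.floor(Math.random() * qValues.length); // random exploratory action
  }
  return qValues.indexOf(Math.max(...qValues)); // greedy action
}

// Temporal-difference target: y_t = R_t + γ · max_a' Q(s_{t+1}, a'; θ⁻)
export function tdTarget(
  reward: number,
  nextState: number[],
  targetNetwork: QFunction, // slow-updating copy of the Q-network (θ⁻)
  gamma: number,
  done: boolean // true if the learning session (episode) has ended
): number {
  if (done) return reward;
  return reward + gamma * Math.max(...targetNetwork(nextState));
}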
This RL-based adaptation layer provides the crucial "self-healing" property. When the BKT model detects that a student is struggling (i.e., their p(L) is not increasing as expected), the state changes, and the RL agent's policy can learn to intervene by recommending a remedial or alternative action, deviating from the A* plan to provide personalized, just-in-time support.15
Section 4: Data Ecosystem and Feature Engineering
The performance of the hybrid A*/RL tutoring model is fundamentally dependent on the quality and richness of the data that fuels it. The system operates within a dynamic data ecosystem, where it not only consumes data to make decisions but also generates new data through its interactions with the learner. This creates a continuous feedback loop that allows the system to refine its models and improve its personalization over time.12 This section details the data required to enhance the system's models and the methods for processing this data into actionable features.
4.1. Enhancing the Student Model: Additional Data Requirements
The standard BKT model, while powerful, relies solely on the binary correctness of student responses. To build a more holistic and accurate Student Model, we must capture a wider range of data that provides insight into the learner's cognitive and metacognitive processes.
- Metacognitive Feedback: A student's self-assessment of their understanding is a valuable, often overlooked, data point. Research has shown that combining metacognitive inputs with knowledge tracing can lead to more accurate models and better learning outcomes.35 After completing a quiz or exercise, the system should prompt the learner with a simple question, such as "How confident are you in your understanding of this topic?" on a 1-5 scale. This data helps disambiguate performance:
- A correct answer with low confidence might indicate a lucky guess.
- An incorrect answer with high confidence might indicate a deep-seated misconception that requires targeted remediation.
This confidence score can be incorporated as a feature in the RL agent's state vector or used to adjust the BKT parameters themselves.
- Fine-Grained Interaction Data: The "digital exhaust" from a learner's interaction with a resource provides a rich stream of behavioral signals. The system should log the following for each interaction 14:
- Time-on-Task: How long a student spends on a video, exercise, or quiz. Unusually long times may indicate struggle, while very short times may indicate disengagement or prior mastery.
- Number of Attempts: For quizzes and exercises, tracking the number of tries before a correct answer is a direct measure of difficulty and persistence.
- Hint Usage: The frequency and timing of hint requests are strong indicators that a learner is in their Zone of Proximal Development—challenged but not overwhelmed.37
- Video Analytics: For video lessons, tracking events like pausing, rewinding, re-watching sections, or changing playback speed can pinpoint areas of confusion or high interest.
These behavioral metrics should be included in the RL agent's state vector, providing it with a much more nuanced picture of the learner's engagement and struggle than a simple correct/incorrect signal.
- Learner Traits and Preferences: While more complex to implement reliably, gathering data on stable learner characteristics can inform the initial A* path generation. This can be done through an initial onboarding survey.38 Potential data points include:
- Learning Style Preferences (e.g., VAK - Visual, Auditory, Kinesthetic): A learner who prefers visual content might have paths weighted to favor videos and diagrams over text-heavy resources.37
- Stated Goals and Motivation: Understanding why a student is learning (e.g., for a specific job, for curiosity) can help tailor the types of examples and projects recommended.40
It is important to note, however, that the predictive power of these traits can be limited and confounded by factors like motivation and diligence. One study found that a model using such traits achieved only 69.23% accuracy in path recommendations.39 Therefore, these should be considered secondary inputs, with the dynamic, real-time performance data from BKT and user interactions remaining the primary drivers of adaptation.
4.2. Enhancing the Domain Model: A Multi-Criteria Resource Weighting System
The "cost" of a learning resource, used by the A* algorithm's g(n) function, is a critical parameter that dictates the shape of the generated learning path. A simplistic cost, such as using only the community-provided complexity score, is insufficient. To make truly intelligent planning decisions, the system needs a more nuanced, multi-criteria weighting model that reflects the multifaceted nature of learning effort and effectiveness.41
We define the weight w(LR) for each Learning Resource as a tunable, weighted sum of several factors. This approach makes the cost function transparent and allows system designers to balance different pedagogical priorities. The final weight is a measure of "effective effort"—we want to find the path that minimizes this value. Therefore, desirable traits (like high engagement) will have negative coefficients to reduce the cost, while undesirable traits (like high complexity) will have positive coefficients to increase it.
The proposed formula is:
w(LR)=c1⋅Complexity+c2⋅CognitiveLoad+c3⋅Time−c4⋅Quality−c5⋅Interactivity−c6⋅Engagement
The following table details each component of this weighting system.
Table 1: Multi-Criteria Weighting for Educational Resources
| Criterion | Data Source | Description | Impact on Cost w(LR) |
|---|---|---|---|
| Intrinsic Complexity | Community-driven scores; NLP analysis of text readability (e.g., Flesch-Kincaid). | The inherent difficulty of the conceptual content being presented. | Positive (higher complexity = higher cost). The coefficient c1 scales this factor. |
| Cognitive Load | Heuristic score based on Mayer's Principles (see Sec 5.2); Expert review. | A measure of how well the resource is designed to minimize extraneous cognitive load. A high score means poor design (e.g., cluttered, redundant). | Positive (poor design = higher cost). The coefficient c2 penalizes badly designed content. |
| Average Time-on-Task | Platform Analytics (mean or median time spent by all users). | The average time learners spend actively engaged with this resource. | Positive (longer time = higher cost). The coefficient c3 reflects the time investment. |
| Pedagogical Quality | Expert ratings; Learner satisfaction surveys (e.g., LSS, NPS).43 | An overall measure of instructional quality, including clarity, accuracy, and relevance of examples. | Negative (higher quality = lower cost). The coefficient c4 rewards well-crafted content. |
| Interactivity Level | Resource metadata (e.g., Quiz=3, Exercise=3, Video=1, Flashcard=2). | The degree of active cognitive participation required from the learner. Active learning is generally more effective. | Negative (higher interactivity = lower cost). The coefficient c5 incentivizes active learning resources. |
| Engagement Score | Platform Analytics (e.g., completion rates, user ratings, thumbs up/down). | A data-driven measure of how engaging and motivating learners find the resource. | Negative (higher engagement = lower cost). The coefficient c6 favors resources that learners enjoy and complete. |
This multi-criteria approach transforms the abstract notion of "learning effort" into a concrete, data-driven, and tunable formula. It provides a mechanism for explainability, allowing educators to understand why a particular path was chosen.43
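A minimal TypeScript sketch of this w(LR) computation is shown below; the coefficient values and the assumption that each criterion has already been normalized to a 0-1 range are illustrative choices that would be tuned against platform data.
TypeScript
// Normalized (0-1) criteria for a single learning resource, per Table 1
export interface ResourceCriteria {
  complexity: number;     // intrinsic conceptual difficulty
  cognitiveLoad: number;  // extraneous-load score (higher = worse design)
  timeOnTask: number;     // normalized average time investment
  quality: number;        // pedagogical quality rating
  interactivity: number;  // degree of active participation required
  engagement: number;     // data-driven engagement score
}

// Tunable coefficients c1..c6 (assumed starting values)
const COEFFS = { c1: 1.0, c2: 0.8, c3: 0.6, c4: 0.7, c5: 0.5, c6: 0.4 };

// w(LR) = c1·Complexity + c2·CognitiveLoad + c3·Time − c4·Quality − c5·Interactivity − c6·Engagement
export function resourceWeight(r: ResourceCriteria, c = COEFFS): number {
  const raw =
    c.c1 * r.complexity +
    c.c2 * r.cognitiveLoad +
    c.c3 * r.timeOnTask -
    c.c4 * r.quality -
    c.c5 * r.interactivity -
    c.c6 * r.engagement;
  // Clamp to a small positive floor so edge weights remain non-negative, as A* requires.
  return Math.max(0.01, raw);
}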
4.3. Mapping User Objectives via Natural Language Processing (NLP)
A key input to the system is the user's own learning objective. This is often expressed in natural language, such as "I want to become a data scientist" or "I need to learn Python for web development".38 The system must be able to parse this free-text input and map it to one or more concrete
goal nodes within the knowledge graph. This is a task for Natural Language Processing (NLP).
The proposed pipeline for this mapping is as follows:
- Text Preprocessing: The user's input string is normalized through standard NLP techniques like lowercasing, stop-word removal, and lemmatization to create a clean set of terms.44
- Entity and Concept Extraction: The system uses NLP models to identify key terms and concepts within the processed text. This can range from simple keyword matching to more sophisticated techniques like Named Entity Recognition (NER) trained to identify skills, technologies, and job roles relevant to the domain.44 For "learn Python for web development," the key entities would be "Python" and "web development."
- Knowledge Graph Matching: The extracted entities are then matched against the nodes in the knowledge graph. This is not a simple string comparison but a semantic one. Each node in the graph should have a rich description, and both the extracted entities and the node descriptions can be converted into high-dimensional vector representations using pre-trained word embeddings (like GloVe or Word2vec) or sentence embeddings (like BERT).1 The system can then calculate the cosine similarity between the user's query vector and all potential goal node vectors in the graph (a minimal sketch of this matching step follows this list):
similarity = (Q ⋅ N) / (||Q|| ⋅ ||N||), where Q is the vector for the user's query and N is the vector for a graph node.
- Goal Disambiguation and Confirmation: The system presents the user with the top-matching concept nodes as potential goals. For example, for "web development," it might suggest "Front-End Web Development," "Back-End Web Development," and "Full-Stack Web Development" as potential goal nodes. The user confirms their choice, providing a precise, machine-readable goalId for the A* search algorithm. This process effectively translates a fuzzy human intention into a concrete computational target.47
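The sketch referenced in step 3 follows; it assumes each node description has already been converted into an embedding vector (via GloVe, Word2vec, BERT, or similar), which is outside the scope of this snippet, and the function names are illustrative.
TypeScript
// Cosine similarity between a query embedding and a node-description embedding
export function cosineSimilarity(q: number[], n: number[]): number {
  let dot = 0;
  let qNorm = 0;
  let nNorm = 0;
  for (let i = 0; i < q.length; i++) {
    dot += q[i] * n[i];
    qNorm += q[i] * q[i];
    nNorm += n[i] * n[i];
  }
  if (qNorm === 0 || nNorm === 0) return 0;
  return dot / (Math.sqrt(qNorm) * Math.sqrt(nNorm));
}

// Rank candidate goal nodes by similarity to the user's query embedding
export function rankGoalCandidates(
  queryEmbedding: number[],
  nodeEmbeddings: Map<string, number[]>, // nodeId -> embedding (assumed precomputed)
  topK = 3
): Array<{ nodeId: string; score: number }> {
  return Array.from(nodeEmbeddings.entries())
    .map(([nodeId, emb]) => ({ nodeId, score: cosineSimilarity(queryEmbedding, emb) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}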
This entire data ecosystem operates as a continuous learning cycle. The system's actions generate new data (performance logs, interaction metrics), which are used to update the Student Model (BKT probabilities) and the Domain Model (the weights in Table 1). These refined models, in turn, lead to more accurate and personalized future recommendations from the Tutoring Model. This dynamic, two-way flow of information is the essence of a truly adaptive system.
Section 5: Recommendations for Synergistic Content Design
The most sophisticated adaptive algorithm cannot overcome the limitations of poorly designed educational content. The effectiveness of the proposed hybrid A*/RL system is not solely a function of its computational elegance; it is inextricably bound to the pedagogical quality of the learning resources it orchestrates. An investment in the adaptive engine must be matched by an equivalent investment in creating and curating content that is designed to work synergistically with the algorithm. This section provides three foundational, evidence-based principles for content design that will maximize the learning outcomes produced by the system.
5.1. Principle 1: Design for Retention with Spaced Repetition
Rationale: A fundamental and well-documented challenge in learning is the "forgetting curve," first described by Hermann Ebbinghaus. Information encountered only once is rapidly forgotten.48 To build robust, long-term memory, learners must be exposed to information multiple times, with increasing intervals between each review. This technique, known as Spaced Repetition, is one of the most powerful and scientifically validated methods for improving knowledge retention.49
Recommendation: The system should not treat learning as a series of "one-and-done" events. For content that relies on the memorization of discrete facts (e.g., vocabulary, historical dates, scientific definitions, mathematical formulas), a Spaced Repetition System (SRS) should be integrated directly into the learning flow.
Implementation Strategy:
- Identify Suitable Content: Learning resources of type Flashcard are ideal candidates for SRS. When a learner completes a flashcard deck, each individual card (a question-answer pair) should be added to a personal SRS queue for that learner.
- Select an SRS Algorithm: A well-established and effective algorithm like SM-2 (from the SuperMemo family) can be implemented.49 The SM-2 algorithm adjusts the "inter-repetition interval" for each card based on the learner's self-reported ease of recall. After a review, the learner grades their performance (e.g., on a scale from 0-5). The algorithm then calculates the next review date for that specific card. A condensed sketch of this update appears after this list.
- Integrate Reviews into the Learning Plan: The system's daily or weekly learning plan should be a dynamic mix of new content (from the A* path) and review items (from the SRS scheduler). The Tutoring Model can treat a "review session" as a specific type of activity, interleaving it with the primary learning path to create a continuous knowledge retention campaign.50 This transforms learning from a process of simply covering material to one of actively maintaining and strengthening knowledge over time.
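The condensed SM-2 sketch referenced above follows; it uses the commonly published formulation of the interval and ease-factor update, with the data shape and function name being assumptions of this report.
TypeScript
// State tracked per flashcard for SM-2 scheduling
export interface CardSchedule {
  repetitions: number;    // consecutive successful reviews
  easeFactor: number;     // E-Factor, typically starts at 2.5, floor of 1.3
  intervalDays: number;   // days until the next review
}

// quality: learner's self-graded recall, 0 (blackout) to 5 (perfect)
export function sm2Review(card: CardSchedule, quality: number): CardSchedule {
  let { repetitions, easeFactor, intervalDays } = card;

  if (quality >= 3) {
    // Successful recall: grow the interval
    if (repetitions === 0) intervalDays = 1;
    else if (repetitions === 1) intervalDays = 6;
    else intervalDays = Math.round(intervalDays * easeFactor);
    repetitions += 1;
  } else {
    // Failed recall: restart the repetition sequence with a short interval
    repetitions = 0;
    intervalDays = 1;
  }

  // Adjust the ease factor based on recall quality (never below 1.3)
  easeFactor = Math.max(
    1.3,
    easeFactor + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
  );

  return { repetitions, easeFactor, intervalDays };
}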
5.2. Principle 2: Design for Comprehension by Managing Cognitive Load
Rationale: A learner's ability to process new information is constrained by the limited capacity of their working memory. Cognitive Load Theory posits that learning is hampered if the total mental effort required exceeds this capacity.51 This load can be broken down into three types: intrinsic (inherent difficulty of the material), germane (effort dedicated to deep learning and schema construction), and extraneous (effort wasted on processing confusing or irrelevant elements of the instruction). While we cannot always reduce intrinsic load, we can and must minimize extraneous load.52 A brilliantly accurate and well-sequenced learning resource will be ineffective if its presentation is cluttered, confusing, or distracting.
Recommendation: All multimedia learning resources, especially videos, animations, and complex diagrams, must be designed in accordance with Richard Mayer's evidence-based Principles of Multimedia Learning. These 12 principles are a set of guidelines specifically designed to reduce extraneous cognitive load and promote germane processing by leveraging how the human brain processes information.51
Key Principles for Application:
- Coherence Principle: Exclude all extraneous words, pictures, and sounds. If a graphic, sound effect, or background music does not directly support the learning objective, remove it. Simplicity is key.51
- Redundancy Principle: People learn better from graphics and narration than from graphics, narration, and redundant on-screen text. Presenting the same verbal information in both audio (narration) and visual (text) form simultaneously can overload the visual channel and force the learner to waste effort reconciling the two streams. Let the narration explain the visuals.51
- Contiguity Principle (Spatial and Temporal): Present corresponding words and pictures near each other on the screen (spatial contiguity) and at the same time (temporal contiguity). Forcing learners to split their attention between a graphic on one side of the screen and its explanation on the other increases extraneous cognitive load.51
- Segmenting Principle: Break down a long, continuous lesson into smaller, user-paced segments. This allows learners to manage their own cognitive load by digesting one chunk of information before moving to the next. This principle aligns perfectly with the system's modular design, where the knowledge graph is composed of discrete, bite-sized learning resources.51
- Modality Principle: Present words as audio narration rather than on-screen text when accompanied by a graphic or animation. This distributes the processing load across both the visual and auditory channels, preventing the visual channel from becoming a bottleneck.51
By embedding these principles into the content creation guidelines, we ensure that the resources recommended by the algorithm have the best possible chance of being understood and learned effectively.
5.3. Principle 3: Design for Mastery by Aligning Content to Objectives
Rationale: Not all learning is the same. Memorizing a definition is a different cognitive task than applying a complex procedure or creating a novel solution. A robust educational system must use the right tool for the job. Bloom's Taxonomy provides a hierarchical framework for classifying learning objectives based on their cognitive complexity, ranging from lower-order thinking skills (Remembering, Understanding) to higher-order thinking skills (Applying, Analyzing, Evaluating, Creating).53
Recommendation: To ensure pedagogical effectiveness, there must be a clear and explicit alignment between the cognitive level of a learning objective and the type of learning resource used to teach and assess it. The system should not, for example, recommend simple flashcards when the learning objective is to "analyze" a complex scenario.54
Implementation Strategy:
This alignment can be operationalized by tagging each Learning Resource node in the knowledge graph with the primary Bloom's level it targets. This information can then be used by the Tutoring Model to make more intelligent decisions. The following table provides a framework for this mapping.
Table 2: Mapping Resource Types to Cognitive Objectives (Bloom's Taxonomy)
| Bloom's Level | Verbs (for Objectives) 53 | Learning Objective Example | Optimal Resource Type(s) | Rationale |
|---|---|---|---|---|
| Remember | Define, List, Recall, Name | "Recall the four parameters of Bayesian Knowledge Tracing." | Flashcards (SRS), Simple multiple-choice quiz. | These tasks require the simple retrieval of discrete facts from memory. SRS is ideal for building and maintaining this recall ability.49 |
| Understand | Explain, Summarize, Paraphrase | "Explain in your own words how the A* algorithm uses a heuristic." | Video Lesson, Explanatory text with diagrams, Concept map activity. | Requires the learner to build a coherent mental model and connect ideas. Multimedia resources designed with Mayer's principles are highly effective here.51 |
| Apply | Use, Solve, Implement, Demonstrate | "Use the provided TypeScript code to find the shortest path in a given graph." | Interactive Exercise, Coding challenge, Problem set with guided feedback. | Requires the active use of knowledge and procedures in a concrete situation. Practice with feedback is essential.56 |
| Analyze | Distinguish, Compare, Differentiate | "Analyze two different learning paths and identify their respective pedagogical trade-offs." | Case Study, Data analysis project, Comparative simulation. | Requires breaking down information into its constituent parts and understanding the relationships between them. This demands more complex, context-rich resources.54 |
| Evaluate | Justify, Critique, Recommend, Defend | "Evaluate the effectiveness of a proposed RL reward function and justify your assessment." | Peer review activity, Project with a detailed rubric, Debate forum. | Involves making and defending judgments based on a set of criteria. Social and collaborative tools can be effective. |
| Create | Design, Formulate, Build, Compose | "Design a novel heuristic function for the A* algorithm tailored to language learning." | Capstone project, Open-ended "sandbox" simulation, Portfolio assignment. | The highest level of cognition, requiring learners to synthesize knowledge to produce a new, original work. |
By implementing this mapping, the system ensures that its recommendations are not just algorithmically optimal but also pedagogically sound. When the RL agent considers a Substitute or Remediate action, it can filter candidate resources by their appropriateness for the cognitive level of the current learning objective, substantially improving the quality of its adaptations; a minimal sketch of this filtering appears below. The algorithm and the content thus work in a tight, synergistic loop, with each enhancing the effectiveness of the other.
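To make this concrete, the sketch below shows one way the mapping in Table 2 could be encoded and consulted during a Substitute or Remediate decision. This is a minimal illustration, not a prescribed implementation: the names (`BloomLevel`, `ResourceType`, `LearningResource`, `APPROPRIATE_TYPES`, `filterByObjectiveLevel`) are assumptions introduced here for clarity and would need to be reconciled with the knowledge-graph types defined earlier in this report.

```typescript
// Illustrative types; in practice these would extend the knowledge-graph
// node definitions used by the Domain Model.
enum BloomLevel {
  Remember = 1,
  Understand,
  Apply,
  Analyze,
  Evaluate,
  Create,
}

type ResourceType =
  | "flashcard"
  | "quiz"
  | "video"
  | "text"
  | "interactive_exercise"
  | "case_study"
  | "peer_review"
  | "project";

interface LearningResource {
  id: string;
  conceptId: string;
  type: ResourceType;
  bloomLevel: BloomLevel; // primary cognitive level the resource targets
}

// A data-driven encoding of Table 2: which resource types are considered
// appropriate for each cognitive level of a learning objective.
const APPROPRIATE_TYPES: Record<BloomLevel, ResourceType[]> = {
  [BloomLevel.Remember]: ["flashcard", "quiz"],
  [BloomLevel.Understand]: ["video", "text"],
  [BloomLevel.Apply]: ["interactive_exercise", "quiz"],
  [BloomLevel.Analyze]: ["case_study", "interactive_exercise"],
  [BloomLevel.Evaluate]: ["peer_review", "project"],
  [BloomLevel.Create]: ["project"],
};

// Filters candidate resources for a Substitute or Remediate action so that
// only resources aligned with the cognitive level of the current learning
// objective are passed on to the RL agent's action-selection step.
function filterByObjectiveLevel(
  candidates: LearningResource[],
  objectiveLevel: BloomLevel
): LearningResource[] {
  const allowedTypes = APPROPRIATE_TYPES[objectiveLevel];
  return candidates.filter(
    (r) =>
      allowedTypes.includes(r.type) &&
      // Avoid recommending resources that overshoot the objective, e.g. a
      // capstone project when the goal is simple recall.
      r.bloomLevel <= objectiveLevel
  );
}
```

Keeping the alignment as a data table rather than hard-coded logic means content designers can refine the mapping as evidence accumulates, without any change to the Tutoring Model itself.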
Conclusion
This report has detailed a comprehensive, multi-layered framework for the automated generation and dynamic adaptation of personalized educational plans. By grounding the system in the robust, modular architecture of Intelligent Tutoring Systems, we have presented a design that is both theoretically sound and practically implementable. The proposed hybrid Tutoring Model, which synergistically combines a global A* path planner with a local Reinforcement Learning adaptation agent, directly addresses the dual challenge of providing long-term structure while maintaining moment-to-moment responsiveness.
The core contributions of this framework are threefold:
- Architectural Coherence: It integrates distinct, state-of-the-art AI techniques into a cohesive whole. The Knowledge Graph serves as the structured Domain Model, Bayesian Knowledge Tracing provides a nuanced, probabilistic Student Model, and the hybrid A*/RL engine forms a sophisticated Tutoring Model. The system's intelligence is an emergent property of the seamless interaction between these specialized components.
- Pedagogical Grounding: The design is not driven by technology alone but is deeply informed by principles of learning science. The A* algorithm's cost function is a multi-criteria model reflecting pedagogical priorities. The RL agent's reward function is a codified representation of the desired teaching philosophy, optimizing for long-term learning gain over shallow performance. Furthermore, the recommendations for content design—based on Spaced Repetition, Cognitive Load Theory, and Bloom's Taxonomy—ensure that the learning materials are designed to maximize the algorithm's effectiveness.
- Practical Implementation Blueprint: The report provides not only the high-level architecture and mathematical formalisms but also production-ready TypeScript code for the core data structures and pathfinding algorithm. It specifies the data ecosystem required, identifying key metrics to collect and outlining an NLP pipeline for translating user goals into machine-readable targets. This provides a clear and actionable path from concept to deployment.
Ultimately, this framework presents a solution that is more than the sum of its parts. The A* planner provides a strong initial scaffolding that mitigates the "cold start" and sample-inefficiency problems inherent in pure RL systems. The RL agent, in turn, provides the "self-healing" capability that a static planner lacks, using the rich, probabilistic signals from the BKT model to make fine-grained, personalized adjustments. This synergy creates a powerful learning engine capable of delivering educational experiences that are structured, efficient, deeply personalized, and perpetually adaptive to the unique journey of every learner. By building such a system, we move closer to the ultimate goal of educational technology: providing a scalable, high-quality, one-on-one tutoring experience for all.
Works cited
- Generation of Personalized Knowledge Graphs Based on GCN, accessed August 4, 2025, https://www.scirp.org/journal/paperinformation?paperid=112519
- AI Intelligent Tutoring Systems | Vention AI, accessed August 4, 2025, https://ventionai.com/education/intelligent-tutoring-systems
- AI in Education: The Rise of Intelligent Tutoring Systems - Park University, accessed August 4, 2025, https://www.park.edu/blog/ai-in-education-the-rise-of-intelligent-tutoring-systems/
- (PDF) Personalized Learning Path Recommendation based on ..., accessed August 4, 2025, https://www.researchgate.net/publication/381451150_Personalized_Learning_Path_Recommendation_based_on_Knowledge_Graph
- Personalized Learning Path Recommendation based on Knowledge Graph - Warwick Evans Publishing, accessed August 4, 2025, https://wepub.org/index.php/TSSEHR/article/download/1939/2147/8706
- Bayesian knowledge tracing - Wikipedia, accessed August 4, 2025, https://en.wikipedia.org/wiki/Bayesian_knowledge_tracing
- A* search algorithm - Wikipedia, accessed August 4, 2025, https://en.wikipedia.org/wiki/A*_search_algorithm
- (PDF) Smart E-Learning Framework For Personalized Adaptive Learning and Sequential Path Recommendations Using Reinforcement Learning - ResearchGate, accessed August 4, 2025, https://www.researchgate.net/publication/373153465_Smart_E-Learning_Framework_For_Personalized_Adaptive_Learning_and_Sequential_Path_Recommendations_Using_Reinforcement_Learning
- Deep Reinforcement Learning for Personalized Recommendation of Distance Learning, accessed August 4, 2025, https://www.researchgate.net/publication/332106151_Deep_Reinforcement_Learning_for_Personalized_Recommendation_of_Distance_Learning
- Intelligent Tutoring Systems | ACT-R, accessed August 4, 2025, http://act-r.psy.cmu.edu/wordpress/wp-content/uploads/2012/12/173Chapter_37_Intelligent_Tutoring_Systems.pdf
- A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges - arXiv, accessed August 4, 2025, https://arxiv.org/html/2507.18882
- A Systematic Review of the Role of Learning Analytics in Supporting Personalized Learning, accessed August 4, 2025, https://www.mdpi.com/2227-7102/14/1/51
- Data Structures in TypeScript - Graph - DEV Community, accessed August 4, 2025, https://dev.to/ricardo_borges/data-structures-in-typescript-graph-551i
- A Review of Data Mining in Personalized Education: Current Trends and Future Prospects - arXiv, accessed August 4, 2025, https://arxiv.org/pdf/2402.17236
- Guiding (digital) learning paths: Intelligent tutoring systems » Lamarr-Blog, accessed August 4, 2025, https://lamarr-institute.org/blog/ai-education-intelligent-tutoring-systems/
- Logistic Knowledge Tracing Tutorial: Practical Educational Applications, accessed August 4, 2025, https://educationaldatamining.org/edm2024/proceedings/2024.EDM-tutorials.127/index.html
- [2012.12218] BKT-LSTM: Efficient Student Modeling for knowledge tracing and student performance prediction - arXiv, accessed August 4, 2025, https://arxiv.org/abs/2012.12218
- Bayesian Knowledge Tracing, accessed August 4, 2025, https://www.cs.williams.edu/~iris/res/bkt-balloon/index.html
- A reinforcement learning approach to personalized learning recommendation systems | Request PDF - ResearchGate, accessed August 4, 2025, https://www.researchgate.net/publication/327597979_A_reinforcement_learning_approach_to_personalized_learning_recommendation_systems
- Shortest path - Student Academic Success - Monash University, accessed August 4, 2025, https://www.monash.edu/student-academic-success/mathematics/graphs-and-networks/shortest-path
- The A* Algorithm: A Complete Guide | DataCamp, accessed August 4, 2025, https://www.datacamp.com/tutorial/a-star-algorithm
- Introduction to A* - Stanford CS Theory, accessed August 4, 2025, http://theory.stanford.edu/~amitp/GameProgramming/AStarComparison.html
- A* Search Algorithm - GeeksforGeeks, accessed August 4, 2025, https://www.geeksforgeeks.org/dsa/a-search-algorithm/
- TypeScript: JavaScript With Syntax For Types., accessed August 4, 2025, https://www.typescriptlang.org/
- TypeScript Learning Path - Add Types to Your Web and Node.js Apps with TypeScript, accessed August 4, 2025, https://frontendmasters.com/learn/typescript/
- Advanced Graph Algorithms in TypeScript | CodeSignal Learn, accessed August 4, 2025, https://codesignal.com/learn/courses/interview-prep-the-last-mile-in-typescript/lessons/advanced-graph-algorithms-in-typescript
- evilkiwi/astar: Synchronous A* pathfinding for TypeScript - GitHub, accessed August 4, 2025, https://github.com/evilkiwi/astar
- digitsensitive/astar-typescript: A* search algorithm in TypeScript - GitHub, accessed August 4, 2025, https://github.com/digitsensitive/astar-typescript
- Reinforcement Learning for Personalized Education: Review ..., accessed August 4, 2025, https://www.jotverse.com/reinforcement-learning-for-personalized-education-review/
- Reinforcement Learning in Education: A Literature Review - MDPI, accessed August 4, 2025, https://www.mdpi.com/2227-9709/10/3/74
- Reinforcement Learning - GeeksforGeeks, accessed August 4, 2025, https://www.geeksforgeeks.org/machine-learning/what-is-reinforcement-learning/
- Deep reinforcement learning for personalized treatment recommendation - PMC, accessed August 4, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC9427729/
- Leveraging Deep Reinforcement Learning for Pedagogical Policy Induction in an Intelligent Tutoring System - ERIC, accessed August 4, 2025, https://files.eric.ed.gov/fulltext/ED599215.pdf
- Personalized Education Through Reinforcement Learning - Nested.ai, accessed August 4, 2025, https://nested.ai/2024/07/07/personalized-education-through-reinforcement-learning/
- Knowledge tracing for adaptive learning in a metacognitive tutor - ResearchGate, accessed August 4, 2025, https://www.researchgate.net/publication/360633331_Knowledge_tracing_for_adaptive_learning_in_a_metacognitive_tutor
- An improved adaptive learning path recommendation model driven by real-time learning analytics - PMC, accessed August 4, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC9748379/
- Game-Based Adaptive Learning in Probability Education, accessed August 4, 2025, https://www.ijiet.org/vol15/IJIET-V15N1-2212.pdf
- How to Build a Personalized Learning Plan for Employees - Disprz, accessed August 4, 2025, https://disprz.ai/blog/personalized-learning-plan
- Learning Performance in Adaptive Learning Systems: A Case Study of Web Programming Learning Recommendations - PMC - PubMed Central, accessed August 4, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC8831801/
- How Personalized Learning Plans are Transforming Tech Education and How StackRoute Learning Understands It, accessed August 4, 2025, https://www.stackroutelearning.com/how-personalized-learning-plans-are-transforming-tech-education-and-how-stackroute-learning-understands-it/
- A multi-component model for assessing learning objects: The ..., accessed August 4, 2025, https://ajet.org.au/index.php/AJET/article/view/1192
- (PDF) Multi-Criteria Decision Analysis Approaches for Selecting and ..., accessed August 4, 2025, https://www.researchgate.net/publication/309445487_Multi-Criteria_Decision_Analysis_Approaches_for_Selecting_and_Evaluating_Digital_Learning_Objects
- How to Evaluate Adaptive Learning Systems: The Metrics That Matter - Adaptemy, accessed August 4, 2025, https://www.adaptemy.com/how-to-evaluate-adaptive-learning-systems-the-metrics-that-matter/
- Build a Knowledge Graph in NLP - GeeksforGeeks, accessed August 4, 2025, https://www.geeksforgeeks.org/nlp/build-a-knowledge-graph-in-nlp/
- Natural Language Processing and its Use in Education - The Science and Information (SAI) Organization, accessed August 4, 2025, https://thesai.org/Downloads/Volume5No12/Paper_10-Natural_Language_Processing.pdf
- KG-PLPPM: A Knowledge Graph-Based Personal Learning Path ..., accessed August 4, 2025, https://www.mdpi.com/2079-9292/14/2/255
- 5 examples of natural language processing in education | AI in education | NLP and education | Lumenalta, accessed August 4, 2025, https://lumenalta.com/insights/5-examples-of-natural-language-processing-in-education
- Spaced Repetition: What It Is and How to Integrate It in Your Platform - StylemixThemes, accessed August 4, 2025, https://stylemixthemes.com/wp/spaced-repetition-what-it-is-and-how-to-integrate-it-in-your-platform/
- Spaced repetition - Wikipedia, accessed August 4, 2025, https://en.wikipedia.org/wiki/Spaced_repetition
- What is Spaced Repetition in Learning along with its benefits?, accessed August 4, 2025, https://www.neovation.com/learn/17-what-is-spaced-repetition-in-learning
- Cognitive Theory of Multimedia Learning - Online@JSU, accessed August 4, 2025, https://www.jsu.edu/online/faculty/cognitive-theory-of-multimedia-learning.html
- Cognitive Theory of Multimedia Learning (Chapter 3) - Cambridge University Press, accessed August 4, 2025, https://www.cambridge.org/core/books/cambridge-handbook-of-multimedia-learning/cognitive-theory-of-multimedia-learning/24E5AEDEC8F4137E37E15BD2BCA91326
- Bloom's Taxonomy of Measurable Verbs, accessed August 4, 2025, https://www.utica.edu/academic/Assessment/new/Blooms%20Taxonomy%20-%20Best.pdf
- matching items - EdTech Books, accessed August 4, 2025, https://edtechbooks.org/Assessment_Basics/matching_items?tab=download
- Why Use Media to Enhance Teaching and Learning - SERC (Carleton), accessed August 4, 2025, https://serc.carleton.edu/sp/library/media/why.html
- The Ultimate Guide to Instructional Design | Moodle, accessed August 4, 2025, https://moodle.com/us/news/guide-to-instructional-design/